Abstract

We propose a new modeling paradigm for large dimensional aggregates of stochastic systems 
by Generalized Factor Analysis (GFA) models. These models describe the data as the sum of a 
flocking plus an uncorrelated idiosyncratic component. The flocking component describes a sort 
of collective orderly motion which admits a much simpler mathematical description than the 
whole ensemble while the idiosyncratic component describes weakly correlated noise. We first 
discuss static GFA representations and characterize in a rigorous way the properties of the two 
components. For wide-sense stationary sequences the character and existence of GFA models 
is completely clarified. The extraction of the flocking component of a random field is discussed 
for a simple class of separable random fields. 

1 Introduction 

Flocking is a commonly observed behavior in gregarious animals by which many equal individuals 
tend to group and follow, at least approximately, a common path in space. The phenomenon has 
similarities with many scenarios observed in artificial/technological and biological environments 
and has been studied quite actively in recent years O [SU ETJ [1^. A few examples are described 
below. 

The mechanism of formation of flocks is also called convergence to consensus and has been 
intensely studied in the literature, see e.g. [2Tl[38lll9], and there is now a quite articulated theory 
addressing the convergence to consensus under a variety of assumptions on the communication 
strategy among agents, etc.

In this paper we want to address a different issue: given observations of the motion of a large set 
of equal agents and assuming statistical steady state, decide whether there is a flocking component 
in the collective motion and estimate its structural characteristics. The reason for doing this is 
that the very concept of flocking implies an orderly motion which must then admit a much simpler 
mathematical description than the whole ensemble. Once the flocking component (if present) has
been separated, the motion of the ensemble splits naturally into flocking plus a random term which
describes local random disagreements of the individual agents or the effect of external disturbances. 
Hence extracting a flocking structure is essentially a parsimonious modeling problem. 



1.1 Detection of emitters 

In this scenario we suppose there is an unknown number, say q, of emitters, each of them broadcast- 
ing radio impulse trains at a fixed common frequency. Such impulses are received by a large array 
of N antennas spread in space. The measurement of each antenna is corrupted by noise, generated 
by measurement errors or local disturbances, possibly correlated with that of neighboring antennas. 
The setup can be described mathematically by indexing each antenna by an integer $i = 1, 2, \ldots, N$
and denoting by $y_i(t)$ the signal received at time $t$ by antenna $i$. Then the following model can be
used to describe the received signal:

$$ y_i(t) = f_{i1} x_1(t) + \ldots + f_{iq} x_q(t) + \tilde{y}_i(t), \tag{1} $$

where:

• $x_j(t)$ is the signal sent by the $j$-th emitter at time $t$;

• $f_{ij}$ is a coefficient related to the distance between the $j$-th emitter and antenna $i$;

• $\tilde{y}_i(t)$ is the disturbance affecting antenna $i$ at time $t$.

The goal is to detect the number of emitters $q$ and possibly estimate the signal components $x_j(t)$
impinging on the antenna array.

Let $y(t)$, $x(t)$, $\tilde{y}(t)$ denote vector valued quantities in the model (1) of respective dimensions
$N$, $q$ and $N$. The model (1) can be compactly written as

$$ y(t) = F x(t) + \tilde{y}(t), \tag{2} $$

where $y$ is the $N$-dimensional random process of observables; $x(t) = [x_1(t) \ldots x_q(t)]^\top$ is the
unobservable vector of random signals generated by the emitters; $F = \{f_{ij}\} \in \mathbb{R}^{N \times q}$ is an unknown
matrix of coefficients and $\tilde{y}$ is an $N$-dimensional random process of disturbances, uncorrelated with
$x$, whose $i$-th component describes the local disturbance on the $i$-th antenna.

Note that in the model there are several hidden (non-measurable) variables, including the
dimension $q$. In our setting $N$ is assumed to be very large; ideally we shall assume $N \to \infty$.

We may identify $F x(t)$ as the flocking component of $y(t)$. In a primitive statistical formulation
all signals in the model are i.i.d. processes, and the sample values $\{y(t)\}$ are interpreted as random
samples generated by an underlying static model of the form

$$ y = F x + \tilde{y}. \tag{3} $$

One should observe that estimation of this model from observations $\{y(t)\}$ of $y$ consists not only of
estimating the model parameters, say $F$ and the covariance matrix of $\tilde{y}$, but also of constructing the
hidden random quantities $x$ and $\tilde{y}$. The covariance matrix of $y$, say $\Sigma \in \mathbb{R}^{N \times N}$, may be obtained
from the data by standard procedures.

A problem leading to models of similar structure is automated speaker detection. This is the 
problem of detecting the speaking persons (emitters) in a noisy environment at any particular time, 
from signals coming from a large array of N microphones distributed in a room. Here the number 
of emitters is generally small but could be varying with time. Robustly solving this problem is 
useful in areas such as surveillance systems, and human-machine interaction. 

In the model specification it is customary to assume that the noise vector $\tilde{y}$ has uncorrelated
components. In this case the model (3) is a (static) Factor Analysis Model. Statistical inference
on these models leads in general to ill-posed problems and, to resolve the issue, it is often imposed
that the variances of the scalar components of $\tilde{y}$ should all be equal. The problem can then be
solved by computing the smallest eigenvalue of the covariance matrix of y, following an old idea [l^ 
which has generated an enormous literature. The assumption of uncorrelated noise and, especially, 
of equal variances is however rather unrealistic in many instances. 
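
The eigenvalue idea can be illustrated numerically. The following is a minimal sketch, assuming normalized factors ($E\,xx^\top = I$) and equal noise variances, so that the covariance of the static model is $F F^\top + \sigma^2 I$. The function name `count_factors_equal_noise`, the sizes and the tolerance are illustrative choices and not part of the original method; the sketch works on the exact covariance rather than on a sample estimate.

```python
import numpy as np

def count_factors_equal_noise(sigma, tol=1e-8):
    """Estimate q from a covariance of the form F F^T + s2 * I.

    Under the equal-variance assumption the smallest eigenvalue of sigma
    equals the noise variance s2 and has multiplicity N - q.
    """
    eigvals = np.linalg.eigvalsh(sigma)            # ascending order
    s2 = eigvals[0]                                # smallest eigenvalue
    threshold = s2 + tol * max(1.0, abs(eigvals[-1]))
    q = int(np.sum(eigvals > threshold))           # eigenvalues strictly above s2
    return q, s2

rng = np.random.default_rng(0)
N, q_true, s2_true = 50, 3, 0.5                    # illustrative sizes (assumed)
F = rng.standard_normal((N, q_true))               # hypothetical loading matrix
sigma = F @ F.T + s2_true * np.eye(N)              # exact covariance, E[x x^T] = I

q_hat, s2_hat = count_factors_equal_noise(sigma)
print(q_hat, s2_hat)
```

With a sample covariance the small eigenvalues are no longer exactly equal, so in practice a threshold or a statistical test would be needed; the sketch sidesteps that issue.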

1.2 Inference on gene regulatory networks 

In Systems Biology, an important task is the inference on gene regulatory networks in order to 
understand cell physiology and pathology. Genes are known to interact among each other forming 
a network, and their expression is directly regulated by few transcription factors (TFs). Typically, 
TFs and genes are modeled as two distinct networks of interactions which are able also to interact 
with each other. While methods for measuring the gene expressions using microarray data are 
extremely popular, there are still problems in understanding the action of TFs and the scientific 
community is currently working on computational methods for extraction of the action of the TFs 
from the available measurements of gene expression. To this end, a simplification of the interaction 
between genes and TFs is commonly accepted and consists in projecting the TFs network on the
"gene space" [8].

Denoting by a random variable $y_i$ the measured expression profile of the $i$-th gene of the
network, the model (3) is usually also proposed in this framework. In this case:

• The $N$-dimensional vector $y$ represents all the gene expressions. The experimenter can usually
observe a large number of genes, and it is reasonable to assume that $N \to \infty$.

• Each component of the random vector $x$ is associated with a TF. The number $q$ of TFs is a
priori unknown; furthermore $N \gg q$.

• The $N \times q$ matrix $F$ models the strength of the TFs' effect on each gene.

• The vector $\tilde{y}$ describes the interaction of connected genes.

Factor Analysis models (see Section 1.5) have been considered to deal with this problem, see e.g.
[45] [34]: in such a case, the vector $\tilde{y}$ is assumed to have uncorrelated components. However, in
the context of gene regulatory networks the latter assumption may be relaxed, since it is well
known that there are interactions among genes that are not determined by TFs. Then, a possible
assumption is that $\tilde{y}$ admits some "weak correlation" among its components.

1.3 Modeling energy consumption 

In this example, we may want to model the energy consumption (or production) of a network of $N$
users distributed geographically in a certain area, say a city or a region. The energy consumption
$y_i(t)$ of user $i$ is a random variable which can be seen as the sum of two contributions

$$ y_i(t) = f_i^\top x(t) + \tilde{y}_i(t), \tag{4} $$

where the term $f_i^\top x(t)$ represents a linear combination of $q$ hidden variables $x_j(t)$ which model
different factors affecting the energy consumption (or production) of the whole ensemble; say heating
or air conditioning consumption related to seasonal climatic variations, energy production related
to the current status of the economy, etc. The factor vector $x(t)$ determines the average time
pattern of energy consumption/production of each unit, the importance of each scalar factor being
determined by a $q$-tuple of constant weight coefficients $f_{i,k}$.

One may identify the component $F x(t)$ as the flocking component of the model (4). The terms
$\tilde{y}_i(t)$ represent local random fluctuations which model the consumption due to devices that are
usually activated randomly, for short periods of time. They are assumed uncorrelated with the
process $x$. The covariance $E\,\tilde{y}_i(t)\tilde{y}_j(t)$ could be nonzero for neighbouring users but it is reasonable
to expect that it decays to zero when $|i - j| \to \infty$.

To identify such a model one should start from real data of energy consumption collected from a 
large amount of units. A possible application for such a model is the forecasting of the average 
requirement of energy in a certain geographical area. 

1.4 Dynamic modeling in computer vision 

Large-dimensional time series occur often in signal processing applications, typically for example, 
in computer vision and dynamic image processing. The role of identification in image processing 
and computer vision has been addressed by several authors. We may refer the reader to the recent 
survey [13] for more details and references. One starts from a signal $y(t) := \mathrm{vec}(I(\cdot, t))$, obtained
by vectorizing at each time $t$ the intensities $I(\cdot, t)$ at each pixel of an image into a vector, say
$y(t) \in \mathbb{R}^N$, with a "large" number (typically tens of thousands) of components. We may for instance
be interested in modeling (and in identification methodologies thereof) of "dynamic textures" (see
[20]) by linear state space models or in extracting classes of models describing rigid motions of
objects of a scene. Most of these models involve hidden variables, say the state of linear models of 
textures, or the displacement-velocity coordinates of the rigid motions of objects in the scene. The
purpose is of course to compress high dimensional data into simple mathematical structures. Note
that the number of samples that can be used for identification is very often of the same order (and 
sometimes smaller) than the data dimensionality. For instance, in dynamic textures modeling, the 
number of images in the sequences is of the order of a few hundreds while N (which is equal to the 
number of pixels of the image) is certainly of the order of a few hundreds or thousands [2^ E] • 

1.5 Factor Analysis 

Factor Analysis has a long history; it was apparently first introduced by psychologists [47] [TO]
and subsequently studied and applied in various branches of Statistics and Econometrics [31]
[32] El [30]. With a few exceptions however, [29] [50] [JH [M [H [36], little attention has been paid to
these models in the control engineering community. Dynamic versions of factor models have also
been introduced in the econometric literature, see e.g. [24] [39] [28] and references therein.
Recently, we have been witnessing a revival of interest in these models, motivated on one hand 
by the need of modeling very large dimensional vector time series. Vector AR or ARMA models 
are inadequate for modeling signals of large cross-sectional dimension, because they involve a huge 
number of parameters to estimate which may sometimes turn out to be larger than the sample
size. On the other hand, an interesting generalization of dynamic factor analysis models allowing
the cross-sectional dimension of the observed time series to go to infinity, has been proposed by
Chamberlain, Rothschild, Forni, Lippi and collaborators in a series of widely quoted papers [12]
[22] [23]. This new modeling paradigm is attracting considerable attention also in the engineering
system identification community [17 1 [3l \T8 [ [^j. These models, called Generalized Dynamic Factor
Models are motivated by economic and econometric applications. We shall argue that, with some 
elaboration, they may be quite useful also in engineering applications. 

1.6 Problem statement and scope of the paper 

Notations: in this paper boldface symbols will normally denote random arrays, either finite or
infinite. All random variables will be real, zero-mean and with finite variance. In the following we
shall denote by the symbol $H(v)$ the standard inner-product space of random variables linearly
generated by the scalar components $\{v_1, \ldots, v_n, \ldots\}$ of a (possibly infinite) random string $v$.

Let $y(k, t)$ be a second order finite variance random field depending on a space variable $k$ and
on a time variable $t$. The variable $k$ indexes a large ensemble of space locations where equal
"agents" produce at each time $t$ the measurement, $y(k, t)$, of a scalar quantity, say the received
voltage signal of the $k$-th antenna or the expression level of the $k$-th cell in a cell array. We shall
assume that $k$ varies on some ordered index set of $N$ elements and let $t \in \mathbb{Z}$ or $\mathbb{Z}_+$, depending on
the context. Eventually we shall be interested in problems where $N = \infty$. We shall denote by $y(t)$
the random (column) vector with components $\{y(k, t);\ k = 1, 2, \ldots, N\}$. Suitable mathematical
assumptions on this process will be specified in due time.

A (random) flock is a random field having the multiplicative structure $y(k, t) = \sum_{i=1}^{q} f_i(k)\, x_i(t)$,
or equivalently,

$$ y(t) = \sum_{i=1}^{q} f_i\, x_i(t), \tag{5} $$

where $f_i = [f_i(1)\ f_i(2)\ \ldots\ f_i(N)]^\top$, $i = 1, 2, \ldots, q$, are nonrandom $N$-vectors and
$x(t) := [x_1(t)\ \ldots\ x_q(t)]^\top$ is a random process with orthonormal components depending on the time
variable only; i.e. $E\,x(t)x(t)^\top = I_q$, $t \in \mathbb{Z}$.

The idea is that a flock is essentially a deterministic geometric configuration of points in a q- 
dimensional space moving rigidly in a random fashion. The main goal of this paper is to investigate 
when a second order random field has a flocking component and study the problem of extracting it 
from sample measurements of $y(k, t)$. This means that one should be searching for decompositions
of the type:

$$ y(t) = \sum_{i=1}^{q} f_i\, x_i(t) + \tilde{y}(t), \tag{6} $$

where $q \geq 1$ and $\tilde{y}(t)$ is a "random noise" field which should not contain flocking components.
Naturally for the problem to be well-defined one has to specify conditions making this decomposition 
unique. 

A generalization of this setting where y may take vector values is possible, but for the sake of 
clarity we shall here restrict to scalar-valued processes. 

The organization of the paper is as follows: in Section 2 we review static finite-dimensional Factor
Analysis; in Section 3 we discuss the basic ideas leading to representations of infinite dimensional
strings of variables by Generalized Factor Analysis (GFA) models. The problem of representation
by GFA models is discussed in Section 4. The restriction to stationary sequences is discussed in
Section 5; the relation of GFA with the Wold decomposition, the main theme of this section, is
believed to be completely original. Also original is the content of Section 6, where the extraction of
the flocking component for a class of space-time random fields is finally discussed.

Some of the material of this paper has been presented in a preliminary form at conferences [7], [42].

2 A review of Static Factor Analysis models 

A (static) Factor Analysis model is a representation

$$ y = F x + e, \tag{7} $$

of observable random variables $y = [y_1 \ldots y_N]^\top$ as linear combinations of $q$ common factors
$x = [x_1 \ldots x_q]^\top$, plus uncorrelated "noise" or "error" terms $e = [e_1 \ldots e_N]^\top$. An essential part
of the model specification is that the components of the error $e$ should be (zero-mean and)
mutually uncorrelated random variables, i.e.

$$ E\,x e^\top = 0, \qquad E\,e e^\top = \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_N^2\}. \tag{8} $$

The aim of these models is to provide an "explanation" of the mutual interrelation between the
observable variables in terms of a small number of common factors, in the sense that, setting
$\hat{y}_i := f_i^\top x$, where $f_i^\top$ is the $i$-th row of the matrix $F$, one has exactly $E\,y_i y_j = E\,\hat{y}_i \hat{y}_j$ for all
$i \neq j$. This property is just conditional orthogonality (or conditional independence in the Gaussian
case) of the family of random variables $\{y_1, \ldots, y_N\}$ given $x$ and is a characteristic property of
the factors. It is in fact not difficult to see that $y$ admits a representation of the type (7) if and
only if $x$ renders $\{y_1, \ldots, y_N\}$ pairwise conditionally orthogonal given $x$, [411 U]. We stress that
conditional orthogonality given $x$ is actually equivalent to the orthogonality (uncorrelation) of the
components of the noise vector $e$.

Unfortunately these models, although providing a quite natural and useful data compression
scheme in many circumstances, suffer from a serious non-uniqueness problem. In order to clarify
this issue we first note that the property of making $\{y_1, \ldots, y_N\}$ conditionally orthogonal is really
a property of the subspace of random variables linearly generated by the components of the vector
$\hat{y} := F x$, denoted $X := H(\hat{y})$, and it will hold for any set of generators of $X$. Any set of generating
variables for $X$ can serve as a common factors vector and there is no loss of generality in choosing the
generating vector $x$ for $X$ of minimal cardinality (a basis) and normalized, i.e. such that $E\,x x^\top = I$,
which we shall always do in the following. A subspace $X$ making the components of $y$ conditionally
independent is called a splitting subspace for $\{y_1, \ldots, y_N\}$. The so-called "true" variables $\hat{y}_i$ are
then just the orthogonal projections $\hat{y}_i = E[y_i \mid X]$.

We may then call $q = \dim x = \dim X$ the dimension of the model. Hence a model of dimension
$q$ will automatically have $\mathrm{rank}\, F = q$ as well. Two F.A. models for the same observable $y$, whose
factors span the same splitting subspace X are equivalent. This is a trivial kind of non-uniqueness 
since two equivalent F.A. models will have factor vectors related by a real orthogonal transformation 
matrix. 

The serious non-uniqueness comes from the fact that there are in general many (possibly
infinitely many) minimal splitting subspaces for a given family of observables $\{y_1, \ldots, y_N\}$. This is
by now well known [41], [35]. Hence there are in general many nonequivalent minimal F.A. models
(with normalized factors) representing a fixed $N$-tuple of random variables $y$. For example, one can
choose, for each $k \in \{1, \ldots, N\}$, a splitting subspace of the form $X := \mathrm{span}\{y_1, \ldots, y_{k-1}, y_{k+1}, \ldots, y_N\}$,
and thereby obtain $N$ "extremal" F.A. models called elementary regressions which are clearly
nonequivalent.

Note that a Factor Analysis representation induces a decomposition of the covariance matrix $\Sigma$ of
$y$ as

$$ \Sigma = F F^\top + \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_N^2\}, \tag{9} $$

which can be seen as a special kind of low rank plus sparse decomposition of a covariance matrix
[13], a diagonal matrix being, in intuitive terms, as sparse as one could possibly ask for.
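
The low rank plus diagonal decomposition and the trivial part of the non-uniqueness discussed above can be checked numerically. The sketch below is only illustrative: the sizes, the loading matrix `F`, the noise variances and the helper `fa_covariance` are hypothetical. It shows that rotating the factors by an orthogonal matrix leaves the covariance unchanged; it does not exhibit the nonequivalent minimal models, which are the serious source of non-uniqueness.

```python
import numpy as np

rng = np.random.default_rng(1)
N, q = 6, 2                                        # illustrative sizes (assumed)
F = rng.standard_normal((N, q))                    # hypothetical loadings
noise_var = rng.uniform(0.5, 1.5, size=N)          # hypothetical sigma_i^2

def fa_covariance(F, noise_var):
    """Covariance implied by y = F x + e with E[x x^T] = I and diagonal noise."""
    return F @ F.T + np.diag(noise_var)

# Any orthogonal Q gives an equivalent model: factors Q^T x, loadings F Q.
Q, _ = np.linalg.qr(rng.standard_normal((q, q)))
sigma_a = fa_covariance(F, noise_var)
sigma_b = fa_covariance(F @ Q, noise_var)

same_covariance = np.allclose(sigma_a, sigma_b)
# Removing the diagonal part leaves a matrix of rank q.
low_rank_part_rank = np.linalg.matrix_rank(sigma_a - np.diag(noise_var), tol=1e-10)
print(same_covariance, low_rank_part_rank)
```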

The inherent nonuniqueness of F.A. models is called "factor indeterminacy", or unidentifiability,
in the literature, and the term usually refers to parameter unidentifiability, as it may
appear that there are always "too many" parameters to be estimated. It may be argued that once
a model (in essence, a splitting subspace) is selected, it can always be parametrized in a one-to-one
(and hence identifiable) way. Unfortunately, the classification of all possible minimal F.A. represen- 
tations and an explicit characterization of minimality are, to a large extent, still an open problem. 
The difficulty is indeed a serious one. 

Since, as we have argued, in essence non-uniqueness is just a consequence of uncorrelation of the 
noise components, one may try to get uniqueness by giving up or mitigating the requirement of 
uncorrelation of the components of e. This however tends to make the problem ill-defined as the 
basic goal of uniquely splitting the external signal into a noiseless component plus "additive noise" 
is made vacuous, unless some extra assumptions are made on the model and on the very notion of 
"noise" . Quite surprisingly, as we shall see, for models describing an infinite number of observables 
a meaningful weakening of the uncorrelation property can be introduced, so as to guarantee the 
uniqueness of the decomposition. 

3 Aggregate and idiosyncratic sequences 

In this section we shall review the main ideas of Generalized Factor Analysis, drawing quite heavily 
on the papers [12], [23], although with some non-trivial original contributions. We shall restrict for
now to the static case. 

Consider a zero-mean finite variance stochastic process $y := \{y(k),\ k \in \mathbb{Z}_+\}$, which we shall
normally represent as a random column vector with an infinite number of components. The index
$k$ will later have the interpretation of a space variable. We shall normally work on the standard
Hilbert space $H(y)$ linearly generated by the components of $y$, and convergence shall always mean
convergence in the norm topology of this space.

We want to describe the process as a linear combination of a finite number of common random
components plus "noise", i.e.

$$ y(k) = \sum_{i=1}^{q} f_i(k)\, x_i + \tilde{y}(k), \qquad k = 1, 2, \ldots \tag{10} $$

where the random variables $x_i$, $i = 1, \ldots, q$, are the common factors and the deterministic vectors
$f_i$ are the factor loadings. The $x_i$ can be taken, without loss of generality, to be orthonormal
so as to form a $q$-dimensional random vector $x$ with $E\,x x^\top = I_q$. The $\tilde{y}(k)$'s are zero mean
random variables orthogonal to $x$. We shall list the linear combinations $\hat{y}(k) := \sum_i f_i(k)\, x_i$ as the
components of an infinite random vector $\hat{y}$ and likewise for the noise terms $\tilde{y}(k)$, so that (10) can
be written $y = \hat{y} + \tilde{y}$ for short. Which specific characteristics qualify the process $\tilde{y}$ as "noise" is a
nontrivial issue which will be one of the main themes of this section and will be made precise
later (see the definition of idiosyncratic noise below).

The infinite covariance matrix of the vector $y$ is formally written as $\Sigma := E\{y y^\top\}$. We let $\Sigma_n$
indicate the top-left $n \times n$ block of $\Sigma$, equal to the covariance matrix of the first $n$ components of
$y$, the corresponding $n$-dimensional vector being denoted by $y^n$. The inequality $\Sigma > 0$ means that
all submatrices $\Sigma_n$ of $\Sigma$ are positive definite, which we shall always assume in the following.
Letting $\hat{\Sigma} := E\,\hat{y}\hat{y}^\top$ and $\tilde{\Sigma} := E\,\tilde{y}\tilde{y}^\top$, the orthogonality of the noise term and the factor components
implies that

$$ \Sigma = \hat{\Sigma} + \tilde{\Sigma}, \tag{11} $$

that is, $\Sigma_n = \hat{\Sigma}_n + \tilde{\Sigma}_n$, $\forall n \in \mathbb{N}$. Even imposing $\hat{\Sigma}$ of low rank, this is a priori a highly non-unique
decomposition. There are situations/examples in which $\tilde{\Sigma}$ is diagonal as in the static Factor
Analysis case, but these situations are exceptional.

Let $\ell^2(\Sigma)$ denote the Hilbert space of infinite sequences $a := \{a(k),\ k \in \mathbb{Z}_+\}$ such that
$\|a\|_\Sigma^2 := a^\top \Sigma a < \infty$. When $\Sigma = I$ we use the standard symbol $\ell^2$, denoting the corresponding norm by
$\|\cdot\|_2$.

Definition 3.1 (Forni, Lippi) A sequence of elements $\{a_n\}_{n \in \mathbb{Z}_+} \subset \ell^2 \cap \ell^2(\Sigma)$ is an averaging
sequence (AS) for $y$ if $\lim_{n\to\infty} \|a_n\|_2 = 0$.

We say that a sequence of random variables $y$ is idiosyncratic if $\lim_{n\to\infty} a_n^\top y = 0$ for any
averaging sequence $a_n \in \ell^2 \cap \ell^2(\Sigma)$.

Whenever the covariance $\Sigma$ is a bounded operator on $\ell^2$ one has $\ell^2 \subset \ell^2(\Sigma)$; in this case an AS
can be seen just as a sequence of linear functionals in $\ell^2$ converging strongly to zero.

Examples: The sequence of elements in $\ell^2$

$$ a_n = \frac{1}{n} [\, \underbrace{1 \ \ldots \ 1}_{n} \ \ 0 \ \ldots \,]^\top \tag{12} $$

is an averaging sequence for any $\Sigma$. On the other hand, let $P_n$ denote the compression of the $n$-th
power of the left shift operator to the space $\ell^2$, i.e. $[P_n a](k) = a(k - n)$ for $k > n$ and zero otherwise.
Then $\lim_{n\to\infty} P_n a = 0$ for all $a \in \ell^2$ [26], so that $\{P_n a\}_{n \in \mathbb{Z}_+}$ is an AS for any $a \in \ell^2$.

Let $\mathbf{1}$ be an infinite column vector of 1's and let $x$ be a scalar random variable uncorrelated
with $\tilde{y}$, a zero-mean weakly stationary ergodic sequence. Consider the process

$$ y = \mathbf{1} x + \tilde{y} $$

and the averaging sequence (12). Since $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \tilde{y}(k) = E\,\tilde{y}(k) = 0$ (limit in $L^2$) we have
$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} y(k) = x$; hence we can recover the latent factor by averaging. More generally, if
$\tilde{y}$ is idiosyncratic, $\lim_{n\to\infty} a_n^\top \tilde{y} = 0$ for any averaging sequence and one could recover $x$ from AS's
such that $\lim_{n\to\infty} a_n^\top \mathbf{1}$ exists and is nonzero. □
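
The recovery of the latent factor by averaging can be simulated directly. The following is a minimal sketch, assuming i.i.d. unit-variance Gaussian noise (a special case of a weakly stationary ergodic sequence) and the averaging sequence with $n$ leading entries equal to $1/n$; the helper name `averaging_sequence`, the truncation length and the seed are illustrative.

```python
import numpy as np

def averaging_sequence(n, length):
    """First `length` entries of a_n = (1/n) [1 ... 1 0 ...]^T (n ones)."""
    a = np.zeros(length)
    a[:n] = 1.0 / n
    return a

rng = np.random.default_rng(2)
length = 10_000
x = rng.standard_normal()                          # one draw of the latent factor
noise = rng.standard_normal(length)                # i.i.d. unit-variance noise (assumed)
y = x + noise                                      # y(k) = x + noise(k)

for n in (10, 100, 1_000, 10_000):
    a_n = averaging_sequence(n, length)
    norm_a = np.linalg.norm(a_n)                   # equals 1/sqrt(n), tends to 0
    estimate = a_n @ y                             # average of the first n entries
    print(n, norm_a, abs(estimate - x))
```

Since the entries of `a_n` sum to one, the error `estimate - x` is just the average of the noise, whose standard deviation is $1/\sqrt{n}$ under the stated assumption.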

The following definition is meant to capture the phenomenon described in the example. 

Definition 3.2 Let $z \in H(y)$. The random variable $z$ is an aggregate (of $y$) if there exists an AS
$\{a_n\}$ such that $\lim_{n\to\infty} a_n^\top y = z$. The set of all aggregate random variables in $H(y)$ is denoted by
$\mathcal{G}(y)$.

It is straightforward to check that $\mathcal{G}(y)$ is a closed subspace. It is called the aggregation subspace
of $H(y)$. Clearly, if $y$ is an idiosyncratic sequence then $\mathcal{G}(y) = \{0\}$. In general it is possible to
define an orthogonal decomposition of the type

$$ y = E[y \mid \mathcal{G}(y)] + u, \tag{13} $$

where all components $u(k)$ are uncorrelated with $\mathcal{G}(y)$. The idea behind this decomposition is
that, in case $\mathcal{G}(y)$ is finite dimensional, say generated by a $q$-dimensional random vector $x$, one
may naturally capture a unique decomposition of $y$ of the type (10).

Unfortunately however, in general $\mathcal{G}(y) = \{0\}$ does not imply that $y$ is idiosyncratic. See the
example below, inspired by a similar one in [23].

Example 3.1 Consider a sequence $y$ with $y(j) \perp y(h)$ $\forall j \neq h$ (a possibly non-stationary white
noise), and let $z$ be an aggregate random variable, so that there is an AS $\{a_n\}$ such that

$$ z = \lim_{n\to\infty} a_n^\top y = \lim_{n\to\infty} \sum_{j=1}^{\infty} a_n(j)\, y(j). \tag{14} $$

Note that, $z$ being in $H(y)$ and $y$ an orthogonal basis of this space, we can uniquely express $z$ as

$$ z = \sum_{j=1}^{\infty} b(j)\, y(j), \tag{15} $$

and, by uniqueness of the representation, it follows that $\lim_{n\to\infty} a_n(j) = b(j)$ $\forall j$. On the other
hand, $a_n$ being an AS, the limits of $a_n(j)$ must be zero, so that $b(j) = 0$. Hence $z = 0$. Thus a
white noise process always has $\mathcal{G}(y) = \{0\}$.

However if $\{y(k)\}$ has unbounded variance, the sequence is not idiosyncratic. For example if
$\|y(k)\|^2 = k$, given the AS

$$ d_n = \frac{1}{\sqrt{n}} [\, \underbrace{0 \ \ldots \ 0}_{n-1} \ \ 1 \ \ 0 \ \ldots \,]^\top, \tag{16} $$

we have $\|d_n^\top y\| = 1$ $\forall n$. Hence in this case $y$ is neither aggregate nor idiosyncratic. On the other
hand, when $\|y(k)\| \leq M < \infty$ for all $k$, we have

$$ \|a_n^\top y\|^2 = \sum_{k=1}^{\infty} a_n(k)^2\, \|y(k)\|^2 \leq M^2 \|a_n\|_2^2 \to 0 \tag{17} $$

for $n \to \infty$. Hence a white noise process with a uniformly bounded variance (has a trivial
aggregation subspace and) is idiosyncratic. □
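
The two cases of the example can be checked with exact variances, since for white noise the variance of $a^\top y$ is $\sum_k a(k)^2 \|y(k)\|^2$. The sketch below assumes that the sequence in (16) has a single nonzero entry $1/\sqrt{n}$ in position $n$, and takes $M = 1$ in the bounded case; the helper name and the truncation length are illustrative.

```python
import numpy as np

def quadratic_form_variance(a, variances):
    """Variance of a^T y for white noise y with the given per-entry variances."""
    return float(np.sum(a**2 * variances))

length = 2_000
k = np.arange(1, length + 1)
growing = k.astype(float)                          # ||y(k)||^2 = k, unbounded
bounded = np.ones(length)                          # ||y(k)||^2 = 1, bounded (assumed M = 1)

for n in (10, 100, 1_000):
    d_n = np.zeros(length)
    d_n[n - 1] = 1.0 / np.sqrt(n)                  # single entry in position n
    a_n = np.zeros(length)
    a_n[:n] = 1.0 / n                              # n leading entries equal to 1/n
    print(n,
          np.linalg.norm(d_n),                     # 1/sqrt(n) -> 0, so d_n is an AS
          quadratic_form_variance(d_n, growing),   # stays at 1: not idiosyncratic
          quadratic_form_variance(a_n, bounded))   # 1/n -> 0 in the bounded case
```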
The nature of an idiosyncratic sequence is related to certain properties of its covariance matrix.
To explain this point, we need to introduce some notations and facts about the eigenvalues of
sequences of covariance matrices. Denote by $\lambda_{n,k}(\Sigma)$ the $k$-th eigenvalue of the $n \times n$ upper left
submatrix $\Sigma_n$ of $\Sigma$. The $\lambda_{n,k}(\Sigma)$ are real nonnegative and can be ordered by decreasing magnitude.
By Weyl's theorem [48, p. 203], see also [23, Fact M], the $k$-th eigenvalue of $\Sigma_n$ is a nondecreasing
function of $n$ and hence has a limit, $\lambda_k(\Sigma)$, which may possibly be equal to $+\infty$. Each such limit
is called an eigenvalue of $\Sigma$. These limits however are in general not true eigenvalues, as it is
well known that $\Sigma$ may not have eigenvalues. For example, a bounded symmetric Toeplitz matrix
has a purely continuous spectrum [27]. Anyway, since $\Sigma$ is symmetric and positive, its spectrum lies
on the positive half line and its elements can also be ordered. Henceforth we shall denote by $\lambda_1(\Sigma)$
the maximal eigenvalue of $\Sigma$, as defined above, with the convention that $\lambda_1(\Sigma) = +\infty$ when there
are infinite eigenvalues. The following result will be instrumental in understanding the structure of
idiosyncratic processes.

Theorem 3.1 If $\lambda_1(\Sigma)$ is finite, then $\Sigma$ is a bounded operator on $\ell^2$.

Proof: Let $\lambda_1(\Sigma_n)$ be the maximal eigenvalue of $\Sigma_n$. Denote the string of the first $n$ elements of
an infinite sequence $a$ by $a^n$. Since

$$ \Sigma_n \leq \lambda_1(\Sigma_n) I_n \leq \lambda_1(\Sigma) I_n, \tag{18} $$

where $I_n$ is the $n \times n$ identity matrix and $\lambda_1(\Sigma) < \infty$ by assumption, it follows that for all sequences
$x, y \in \ell^2$

$$ (x^n)^\top \Sigma_n y^n \leq \lambda_1(\Sigma)\, \|x^n\|_2 \|y^n\|_2, \qquad n = 1, 2, \ldots \tag{19} $$

Then the result follows from the theorem in [H p. 53]. □

A strong characterization of idiosyncratic sequences is stated in the following theorem, inspired by
[23] after some obvious simplifications. For completeness we shall provide a proof.

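Theorem 3.2 The sequence $y$ is idiosyncratic if and only if $\lambda_1(\Sigma)$ is finite; equivalently, if and
only if its covariance matrix defines a bounded operator on $\ell^2$.

Proof: Assume first that $\lim_{n\to\infty} \lambda_{n,1}(\Sigma) = +\infty$. Since $\Sigma_n > 0$ is symmetric it has a spectral
representation

$$ U_n^\top \Sigma_n U_n = D_n, \tag{20} $$

where $U_n$ is an orthogonal matrix and $D_n = \mathrm{diag}\{\lambda_{n,1}(\Sigma), \ldots, \lambda_{n,n}(\Sigma)\}$. Consider the first column of
$U_n$, say $u_1^n$, which is the eigenvector of $\lambda_{n,1}(\Sigma)$, and define the sequence of elements in $\ell^2 \cap \ell^2(\Sigma)$
constructed as

$$ a_n := \frac{1}{\sqrt{\lambda_{n,1}(\Sigma)}} [\, (u_1^n)^\top \ \ 0 \ \ldots \,]^\top, \qquad n = 1, 2, \ldots \tag{21} $$

Since $\lim_{n\to\infty} \lambda_{n,1}(\Sigma) = +\infty$, this is an AS, for which

$$ \|a_n^\top y\|^2 = \frac{1}{\lambda_{n,1}(\Sigma)} (u_1^n)^\top \Sigma_n u_1^n = 1 \tag{22} $$

for every $n$ and hence the sequence $y$ cannot be idiosyncratic.

Conversely, suppose that $\lambda_1(\Sigma) < +\infty$ and again use the diagonalization (20). Let $a_n$ be an
arbitrary AS and consider the random variable $z = \lim_{n\to\infty} a_n^\top y = \lim_{n\to\infty} (a_n^n)^\top y^n$ which has
variance

$$ \mathrm{var}[z] = \lim_{n\to\infty} (a_n^n)^\top U_n D_n U_n^\top a_n^n := \lim_{n\to\infty} (d_n^n)^\top D_n d_n^n, \tag{23} $$

where the vector $d_n^n := U_n^\top a_n^n$ is used to form the first $n$ elements of an infinite string, say $d_n$, whose
remaining entries are taken equal to those of $a_n$; i.e. $d_n(k) = a_n(k)$ for $k > n$. Clearly $d_n$ is an AS,
since $U_n$ is orthogonal and hence $\|d_n\|_2 = \|a_n\|_2$.
Since $(d_n^n)^\top D_n d_n^n = \sum_{k=1}^{n} \lambda_{n,k}(\Sigma)\, d_n(k)^2$, one can write

$$ \mathrm{var}[z] = \lim_{n\to\infty} \sum_{k=1}^{n} \lambda_{n,k}(\Sigma)\, d_n(k)^2 \leq \lim_{n\to\infty} \lambda_1(\Sigma) \sum_{k=1}^{n} d_n(k)^2 \leq \lim_{n\to\infty} \lambda_1(\Sigma)\, \|d_n\|_2^2 = 0, $$

which shows that $y$ is idiosyncratic. □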

In particular, since the covariance of a white noise process is diagonal, the covariance of a
white noise can be bounded (and therefore $y$ can be idiosyncratic) only if the variances $\|y(k)\|^2$
are uniformly bounded. This completes the discussion in Example 3.1.

Definition 3.3 Let $q$ be a finite integer. A sequence $y$ is purely deterministic of rank $q$ (in short
$q$-PD) if $H(y)$ has dimension $q$.

Clearly a $q$-PD sequence $y$ can be seen as a (in general non-stationary) purely deterministic process
in the classical sense of the term, see e.g. [ISJ. Let $x = [x_1 \ldots x_q]^\top$ be an orthonormal
basis in $H(y)$. Obviously $y$ is a $q$-PD random sequence if and only if there is an $\infty \times q$ matrix
$F = [f_1\ f_2\ \ldots\ f_q]$ such that

$$ y(k) = \sum_{i=1}^{q} f_i(k)\, x_i, \qquad k \in \mathbb{Z}_+, \tag{24} $$

where the columns $f_1, f_2, \ldots, f_q$ must be linearly independent, for otherwise the rank of $y$ would
be smaller than $q$.

We want to relate this concept to the idea of aggregation subspace of $y$, as defined earlier. In
particular we would like to identify $x$ as an orthonormal basis in $\mathcal{G}(y)$. Quite unfortunately however,
there are nontrivial sequences representable in the form (24) which are idiosyncratic (or contain
idiosyncratic sequences). See the example below.

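Example 3.2 Consider a sequence $y$ whose $k$-th element is

$$y(k) = \lambda^k x , \qquad |\lambda| < 1 , \tag{25}$$

where $x$ is a zero-mean random variable of positive variance $\sigma^2$. Clearly, $y$ is 1-PD, its spanned
subspace $H(y)$ being the one-dimensional space $H(x)$. The covariance matrix of the first $n$ components
of $y$ is

$$\Sigma_n = \mathbb{E}\, y^n (y^n)^\top = \sigma^2 \begin{bmatrix} \lambda^2 & \lambda^3 & \ldots & \lambda^{n+1} \\ \lambda^3 & \lambda^4 & \ldots & \lambda^{n+2} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda^{n+1} & \lambda^{n+2} & \ldots & \lambda^{2n} \end{bmatrix} . \tag{26}$$

Since $\operatorname{rank}(\Sigma_n) = 1$ for every $n$, we have

$$\lambda_1(\Sigma) = \lim_{n\to\infty} \operatorname{tr}(\Sigma_n) = \lim_{n\to\infty} \sigma^2 \sum_{k=1}^{n} \lambda^{2k} = \frac{\sigma^2 \lambda^2}{1-\lambda^2} , \tag{27}$$

thus, in force of Theorem 3.2, $y$ is idiosyncratic. Hence there are (non-stationary) $q$-PD sequences
which are idiosyncratic.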
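As a quick numerical sanity check of (27), the following sketch computes the trace of the rank-one covariance for increasing $n$ and compares it with the closed-form limit. Because the covariance has rank one, its trace coincides with its only nonzero eigenvalue. The values `lam = 0.8` and `sigma2 = 1.5` are arbitrary illustrative choices, not taken from the text.

```python
def trace_sigma_n(lam: float, sigma2: float, n: int) -> float:
    """Trace of the n x n covariance of y(k) = lam**k * x, k = 1..n, with var[x] = sigma2."""
    # The (k, k) entry of the covariance is sigma2 * lam**(2k).
    return sigma2 * sum(lam ** (2 * k) for k in range(1, n + 1))


def limit_eigenvalue(lam: float, sigma2: float) -> float:
    """Closed-form limit sigma2 * lam^2 / (1 - lam^2), valid for |lam| < 1."""
    return sigma2 * lam ** 2 / (1.0 - lam ** 2)


if __name__ == "__main__":
    lam, sigma2 = 0.8, 1.5  # illustrative values only
    for n in (5, 20, 80):
        print(n, trace_sigma_n(lam, sigma2, n))
    print("limit", limit_eigenvalue(lam, sigma2))
```

The traces increase with $n$ but stay below the finite limit, which is the boundedness that makes the sequence idiosyncratic.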

This is a possibility which we clearly must exclude if the decomposition (10) has to be unique.
The question is which properties need to be satisfied by the functions $f_1, f_2, \ldots, f_q$ for $y$ to be a
$q$-aggregate sequence. One necessary condition is easily found: the $f_i$ cannot be in $\ell^2$ since otherwise
any sequence of functionals $\{a_n\}$ in $\ell^2$ converging to zero would lead to

$$\lim_{n\to\infty} a_n^\top f_i = 0 , \tag{28}$$

so that $\lim_{n\to\infty} a_n^\top y = 0$ as well. This is clearly the problem with Example 3.2.
We shall call a sequence $y$ $q$-aggregate if its covariance matrix has $q$ nonzero eigenvalues, i.e.
$\operatorname{rank} \Sigma_n = q$, $\forall n$, and $\lim_{n\to\infty} \lambda_{n,k}(\Sigma) = +\infty$ for $k = 1, \ldots, q$. In short, all nonzero eigenvalues of
$\Sigma$ are infinite.

This condition guarantees uniqueness of the decomposition (10) when $\hat y$ is $q$-aggregate and $\tilde y$ is
idiosyncratic.

Proposition 3.1 A $q$-aggregate sequence $y$ can be idiosyncratic only if it is the zero sequence.

Proof : This follows trivially from Theorem 3.2. If $q > 0$ the maximal eigenvalue of the covariance
matrix of $y$ is $+\infty$ by definition. □

Of course the question is under what conditions the $q$ eigenvalues of $\Sigma$ may tend to infinity. Theorem
3.3 below provides an answer.

Definition 3.4 Let

$$\tilde f_i^n := f_i^n - \Pi[f_i^n \mid \mathcal{F}_i^n] \tag{29}$$

where $\Pi$ is the orthogonal projection onto the Euclidean space $\mathcal{F}_i^n = \operatorname{span}\{f_j^n,\ j \neq i\}$ of dimension
$q-1$.

The vectors $f_i$, $i = 1, \ldots, q$ in $\mathbb{R}^\infty$ are strongly linearly independent if

$$\lim_{n\to\infty} \|\tilde f_i^n\|_2 = +\infty , \qquad i = 1, \ldots, q. \tag{30}$$

In a sense, the tails of two strongly linearly independent vectors in $\mathbb{R}^\infty$ cannot get "too close"
asymptotically.
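For $q = 2$ the squared residual norm in (29)-(30) has the closed form $\|f_1^n\|^2 - \langle f_1^n, f_2^n\rangle^2 / \|f_2^n\|^2$, which makes the definition easy to probe numerically. The sketch below evaluates it for two illustrative pairs that are our own choices, not the paper's: the constant sequence against $(-1)^k$, where the residual grows like $n$, and the constant sequence against $1 - 2^{-k}$, where it stays bounded.

```python
from typing import Callable


def residual_sq(f1: Callable[[int], float], f2: Callable[[int], float], n: int) -> float:
    """Squared norm of f1^n minus its orthogonal projection onto span{f2^n} (entries k = 1..n)."""
    a = [f1(k) for k in range(1, n + 1)]
    b = [f2(k) for k in range(1, n + 1)]
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    ab = sum(x * y for x, y in zip(a, b))
    return aa - ab * ab / bb


def ones(k: int) -> float:
    return 1.0


def alternating(k: int) -> float:
    return (-1.0) ** k


def almost_ones(k: int) -> float:
    return 1.0 - 0.5 ** k


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, residual_sq(ones, alternating, n), residual_sq(ones, almost_ones, n))
```

In the second pair the two tails become indistinguishable, so the residual cannot diverge; this is the "too close" behaviour that the definition rules out.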

Theorem 3.3 Let $y$ be a $q$-PD sequence, i.e. let

$$y(k) = \sum_{i=1}^{q} f_i(k)\, x_i , \qquad k \in \mathbb{Z}_+ ; \tag{31}$$

then $y$ is $q$-aggregate if and only if the vectors $f_i$, $i = 1, \ldots, q$, are strongly linearly independent.

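Proof : First we prove the sufficiency of condition (30). Let $k$ be a fixed positive constant and let
$f_1$ be such that

$$\lim_{n\to\infty} \| f_1^n - \Pi[f_1^n \mid \mathcal{F}_1^n] \|_2^2 = k < +\infty . \tag{32}$$

Let

$$\tilde f_1^n := f_1^n - \Pi[f_1^n \mid \mathcal{F}_1^n] = f_1^n - \alpha_2^n f_2^n - \ldots - \alpha_q^n f_q^n ; \tag{33}$$

whence, defining $\tilde F^n := [\tilde f_1^n \; f_2^n \; \ldots \; f_q^n]$, one can write $\tilde F^n = F^n T^n$, where $T^n$ is a full rank matrix
of the form

$$T^n = \begin{bmatrix} 1 & 0 \\ -\alpha_n & I_{q-1} \end{bmatrix} , \tag{34}$$

where $\alpha_n := [\alpha_2^n \; \ldots \; \alpha_q^n]^\top$. Since $\tilde f_1^n \perp f_i^n$, $i \neq 1$, the Gramian matrix of $\tilde F^n$ is block diagonal,

$$(\tilde F^n)^\top \tilde F^n = \begin{bmatrix} \|\tilde f_1^n\|_2^2 & 0 \\ 0 & A_n \end{bmatrix} , \tag{35}$$

where $A_n$ is a positive definite matrix whose eigenvalues tend to infinity as $n$ increases. Note that
the spectrum of $(\tilde F^n)^\top \tilde F^n$ contains the eigenvalue $\|\tilde f_1^n\|_2^2$, which, for $n \to \infty$, converges to $k < +\infty$.
Now, let us compute the trace of both sides of the identity $T^n \big((\tilde F^n)^\top \tilde F^n\big)^{-1} (T^n)^\top = \big((F^n)^\top F^n\big)^{-1}$,
obtaining

$$\begin{aligned}
\operatorname{tr}\big[\big((F^n)^\top F^n\big)^{-1}\big]
&= \operatorname{tr}\big[T^n \big((\tilde F^n)^\top \tilde F^n\big)^{-1} (T^n)^\top\big]
 = \operatorname{tr}\big[\big((\tilde F^n)^\top \tilde F^n\big)^{-1} (T^n)^\top T^n\big] \\
&= \operatorname{tr}\left\{ \begin{bmatrix} \|\tilde f_1^n\|_2^{-2} & 0 \\ 0 & A_n^{-1} \end{bmatrix}
   \begin{bmatrix} 1 + \|\alpha_n\|^2 & -\alpha_n^\top \\ -\alpha_n & I_{q-1} \end{bmatrix} \right\} \\
&= \|\tilde f_1^n\|_2^{-2}\,(1 + \|\alpha_n\|^2) + \operatorname{tr}\big[A_n^{-1}\big]
 \;\ge\; k^{-1}(1 + \|\alpha_n\|^2) + \operatorname{tr}\big[A_n^{-1}\big] ,
\end{aligned} \tag{36}$$

where the last inequality holds because $\|\tilde f_1^n\|_2^2$ is nondecreasing in $n$ and hence never exceeds its limit $k$.
Since the eigenvalues of $A_n$ tend to infinity, those of $A_n^{-1}$ tend to zero, while for every $n$ we have
$k^{-1}(1 + \|\alpha_n\|^2) \ge k^{-1} > 0$. Thus, one eigenvalue of $\big((F^n)^\top F^n\big)^{-1}$ is bounded below by a fixed positive constant as
$n$ tends to infinity. Hence we conclude that one eigenvalue of $(F^n)^\top F^n$ remains bounded as $n$ tends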
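to infinity, which is a contradiction.

For the necessity, we define $f_i^{n_1,n_2} := [f_i(n_1+1) \; \ldots \; f_i(n_2)]^\top$ and observe that condition (30)
implies that

$$\lim_{n\to\infty} \| f_i^{n_1,n} - \Pi[f_i^{n_1,n} \mid \mathcal{F}_i^{n_1,n}] \|_2 = +\infty , \tag{37}$$

for every index $i = 1, \ldots, q$ and natural number $n_1$. Moreover, by definition of limit, we have that
for every $n_1 \in \mathbb{N}$ and $K \in \mathbb{R}_+$ there exists an integer $n_2$ such that the inequality (with an obvious
meaning of the symbols)

$$\|\tilde f_i^{n_1,n_2}\|_2^2 > K \tag{38}$$

holds for every $i = 1, \ldots, q$.

Now, consider the sequence generated by the $q$-th eigenvalue of the matrix $(F^n)^\top F^n$, say $\{\lambda_q^n ;\ n \in \mathbb{N}\}$.
Our goal is to show that for every natural $n_1$ and arbitrary constant $c > 0$ there exists a natural
number $n_2$ such that $\lambda_q^{n_2} \ge \lambda_q^{n_1} + c$, so that $\lim_{n\to\infty} \lambda_q^n = +\infty$. To this end, fix $c$ and, for a generic
$n_2$, consider the normalized eigenvector of the $q$-th eigenvalue of the matrix $(F^{n_2})^\top F^{n_2}$, say $v_q^{n_2}$.
Since for every $n_2 > n_1$ it holds that

$$(F^{n_2})^\top F^{n_2} = (F^{n_1})^\top F^{n_1} + (F^{n_1,n_2})^\top F^{n_1,n_2} , \tag{39}$$

we can write

$$\lambda_q^{n_2} = (v_q^{n_2})^\top (F^{n_1})^\top F^{n_1} v_q^{n_2} + (v_q^{n_2})^\top (F^{n_1,n_2})^\top F^{n_1,n_2} v_q^{n_2} . \tag{40}$$

Consider the first term on the right side of this identity; expressing $v_q^{n_2}$ as a linear combination of
the eigenvectors of $(F^{n_1})^\top F^{n_1}$, i.e. $v_q^{n_2} = a_1 v_1^{n_1} + \ldots + a_q v_q^{n_1}$, the orthogonality of these eigenvectors
implies that

$$(v_q^{n_2})^\top (F^{n_1})^\top F^{n_1} v_q^{n_2} = \lambda_1^{n_1} a_1^2 + \ldots + \lambda_q^{n_1} a_q^2 \ge \lambda_q^{n_1} \sum_{j=1}^{q} a_j^2 = \lambda_q^{n_1} , \tag{41}$$

so that

$$\lambda_q^{n_2} \ge \lambda_q^{n_1} + (v_q^{n_2})^\top (F^{n_1,n_2})^\top F^{n_1,n_2} v_q^{n_2} . \tag{42}$$

Now we have to show that we can always find an integer $n_2$ such that the quantity
$(v_q^{n_2})^\top (F^{n_1,n_2})^\top F^{n_1,n_2} v_q^{n_2}$
can be chosen arbitrarily large, i.e. greater or equal to the previously fixed constant $c$. To this end,
take $n_2$ such that for every $i = 1, \ldots, q$ the inequality (38) holds, with $K = cq$. Then, there is
an index $i$ such that the $i$-th component of the norm one vector $v_q^{n_2} = [w_1 \; \ldots \; w_q]^\top$ satisfies the
inequality $w_i^2 \ge 1/q$. Without loss of generality we may and shall assume that $i = 1$. Let $\alpha_2, \ldots, \alpha_q$
be defined as in (33) (with $f_i^{n_1,n_2}$ in place of $f_i^n$) and set

$$\tilde f_1^{n_1,n_2} := f_1^{n_1,n_2} - \alpha_2 f_2^{n_1,n_2} - \ldots - \alpha_q f_q^{n_1,n_2} , \tag{43}$$

so that we have

$$(v_q^{n_2})^\top (F^{n_1,n_2})^\top F^{n_1,n_2} v_q^{n_2} = (T^{-1} v_q^{n_2})^\top \begin{bmatrix} \|\tilde f_1^{n_1,n_2}\|_2^2 & 0 \\ 0 & A \end{bmatrix} T^{-1} v_q^{n_2} , \tag{44}$$

where $T$ has the same structure as in (34) and $A$ is the positive semidefinite Gramian of $f_2^{n_1,n_2}, \ldots, f_q^{n_1,n_2}$. Now, observe that

$$T^{-1} v_q^{n_2} = [\, w_1 \quad \alpha_2 w_1 + w_2 \quad \ldots \quad \alpha_q w_1 + w_q \,]^\top , \tag{45}$$

which implies that (44) is equal to $w_1^2 \|\tilde f_1^{n_1,n_2}\|_2^2 + Q$ where $Q$ is a nonnegative quantity. Hence, from
(38) we have $(v_q^{n_2})^\top (F^{n_1,n_2})^\top F^{n_1,n_2} v_q^{n_2} \ge K/q = c$, and hence, recalling (42),

$$\lambda_q^{n_2} \ge \lambda_q^{n_1} + c , \tag{46}$$

which proves the theorem. □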



Example 3.3 Consider the following 2-PD sequence $y(k) := f_1(k)\,x_1 + f_2(k)\,x_2$
with

$$f_1(k) = 1 \ \text{ for all } k , \qquad f_2(k) = 1 - \left(\tfrac{1}{2}\right)^k .$$

It is not difficult to check that this sequence does not satisfy condition (30). We shall show that
this sequence is not 2-aggregate. The Gramian matrix of the functions $f_1, f_2$ restricted to $[1, n]$ is

$$(F^n)^\top F^n = \begin{bmatrix} \|f_1^n\|_2^2 & \langle f_1^n, f_2^n \rangle_2 \\ \langle f_1^n, f_2^n \rangle_2 & \|f_2^n\|_2^2 \end{bmatrix}$$

and it can be seen that as $n \to \infty$, the second eigenvalue converges to $\tfrac{1}{6}$. Hence one eigenvalue of
the covariance matrix of $y$ is finite and the sequence is not 2-aggregate. □
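The eigenvalue claim in this example can be checked with the closed-form eigenvalues of a symmetric $2 \times 2$ matrix. The sketch below assumes the loadings $f_1(k) = 1$ and $f_2(k) = 1 - 2^{-k}$; this is our reading of the example, whose formula is damaged in the source, so the numbers should be taken as illustrating the mechanism rather than as the authors' own computation.

```python
import math
from typing import Tuple


def gramian(n: int) -> Tuple[float, float, float]:
    """Entries (g11, g12, g22) of the Gramian of f1, f2 restricted to k = 1..n."""
    f1 = [1.0] * n
    f2 = [1.0 - 0.5 ** k for k in range(1, n + 1)]  # assumed form of f2
    g11 = math.fsum(a * a for a in f1)
    g12 = math.fsum(a * b for a, b in zip(f1, f2))
    g22 = math.fsum(b * b for b in f2)
    return g11, g12, g22


def eigenvalues_2x2(g11: float, g12: float, g22: float) -> Tuple[float, float]:
    """Eigenvalues (largest, smallest) of the symmetric matrix [[g11, g12], [g12, g22]]."""
    mean = 0.5 * (g11 + g22)
    radius = math.hypot(0.5 * (g11 - g22), g12)
    largest = mean + radius
    det = g11 * g22 - g12 * g12
    # det / largest avoids the cancellation in (mean - radius) when largest >> smallest.
    return largest, det / largest


if __name__ == "__main__":
    for n in (10, 100, 10000):
        print(n, eigenvalues_2x2(*gramian(n)))
```

Under this assumption the determinant of the Gramian grows like $n/3$ while the trace grows like $2n$, so one eigenvalue diverges and the other settles near $1/6$.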

4 GFA representations: Existence and uniqueness

We eventually come to a precise definition of the basic object of our study. The following is the
static version of a similar definition of [23] for the dynamic setting.

Definition 4.1 A random sequence $y$ is a $q$-factor sequence ($q$-FS) if it can be written as an
orthogonal sum

$$y(k) = \sum_{i=1}^{q} f_i(k)\, x_i + \tilde y(k) , \qquad k = 0, 1, 2, \ldots \tag{47}$$

where $\hat y := \sum_i f_i x_i$ is a $q$-aggregate sequence and $\tilde y$ is idiosyncratic and orthogonal to $x$. The
representation (47) is called a Generalized Factor Analysis (GFA) representation of $y$
with $q$ factors.

The crucial question is now which random sequences are $q$-FS. A first step is to discuss the problem
for covariance matrices.

Definition 4.2 The covariance $\Sigma$ has a GFA decomposition of rank $q$ if it can be decomposed as
the sum of a matrix $\tilde\Sigma$ which is a bounded operator in $\ell^2$ and a rank $q$ perturbation $\hat\Sigma = FF^\top$ where
$F \in \mathbb{R}^{\infty \times q}$ has strongly linearly independent columns.

Chamberlain and Rothschild [12, Theorem 4] provide a criterion for a GFA decomposition based
on separating the bounded from the unbounded eigenvalues of $\Sigma$. The criterion has been extended
by Forni and Lippi [23] to the dynamic case.

Theorem 4.1 (Chamberlain-Rothschild) The covariance $\Sigma$ has a GFA decomposition of rank $q$,

$$\Sigma = FF^\top + \tilde\Sigma , \qquad \text{with } F = [f_1 \; \ldots \; f_q] , \quad f_i \in \mathbb{R}^\infty , \tag{48}$$

if and only if, for $n \to \infty$, $\Sigma_n$ has $q$ unbounded eigenvalues and $\lambda_{q+1}(\Sigma_n)$ stays bounded.
The GFA decomposition of $\Sigma$ is unique.

Note that there may well be sequences (of positive symmetric) $\Sigma_n$ for which all eigenvalues tend to
infinity. In this case there is no GFA decomposition. When it applies, the criterion can be seen as a
limit of the well-known rule of separating "large" from "small" eigenvalues in Principal Components
Analysis. Let $f_i^n \in \mathbb{R}^n$; $i = 1, \ldots, q$ be the eigenvectors corresponding to the $q$ (ordered) eigenvalues
of $\Sigma_n$ which increase without bound when $n \to \infty$. We normalize these eigenvectors in such a way
that $F_n := [f_1^n \; \ldots \; f_q^n]$ yields $\hat\Sigma_n = F_n F_n^\top$. Then

$$\lim_{n\to\infty} F_n F_n^\top = FF^\top . \tag{49}$$

Although the usual orthogonality of the $f_i^n$ in PCA does not make sense in infinite dimensions as
the limit eigenvectors do not belong to $\ell^2$, one may however interpret the strong linear independence
condition as a limit of the orthogonality holding for finite $n$. Hence we can (asymptotically) get $q$
and $F$ by a limit PCA procedure on the sequence $\Sigma_n$.
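The eigenvalue separation behind this limit PCA procedure can be observed on a small synthetic one-factor covariance. The sketch below is only an illustration under assumptions of our own: the loadings $f(k) = 1 + 0.5\cos k$ and the diagonal idiosyncratic variances $d(k) = 1 + 0.5\sin k$ are hypothetical, and the two largest eigenvalues are obtained by plain power iteration with deflation rather than by any method prescribed in the text.

```python
import math
from typing import List, Tuple


def top_two_eigenvalues(f: List[float], d: List[float], iters: int = 200) -> Tuple[float, float]:
    """Two largest eigenvalues of f f^T + diag(d), by power iteration with deflation."""

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        s = math.sqrt(dot(v, v))
        return [x / s for x in v]

    def apply_sigma(v):
        c = dot(f, v)  # (f f^T + D) v = f (f . v) + D v, without forming the matrix
        return [fk * c + dk * vk for fk, dk, vk in zip(f, d, v)]

    # Largest eigenpair.
    u = normalize(list(f))
    for _ in range(iters):
        u = normalize(apply_sigma(u))
    lam1 = dot(u, apply_sigma(u))

    # Second eigenpair: iterate on the deflated operator Sigma - lam1 * u u^T.
    def apply_deflated(v):
        c = dot(u, v)
        return [s - lam1 * c * uk for s, uk in zip(apply_sigma(v), u)]

    w = normalize([(-1.0) ** k * (k + 1) for k in range(len(f))])
    for _ in range(iters):
        w = normalize(apply_deflated(w))
    lam2 = dot(w, apply_deflated(w))
    return lam1, lam2


def one_factor_model(n: int) -> Tuple[List[float], List[float]]:
    """Hypothetical loadings f(k) = 1 + 0.5 cos k and noise variances d(k) = 1 + 0.5 sin k."""
    f = [1.0 + 0.5 * math.cos(k) for k in range(1, n + 1)]
    d = [1.0 + 0.5 * math.sin(k) for k in range(1, n + 1)]
    return f, d


if __name__ == "__main__":
    for n in (50, 100, 400):
        print(n, top_two_eigenvalues(*one_factor_model(n)))
```

The first eigenvalue grows roughly linearly with $n$, while the second never exceeds the largest idiosyncratic variance (here 1.5), so in this toy case the rule selects $q = 1$.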

Trivially, if a random sequence $y$ admits a GFA representation then its covariance matrix has
a GFA decomposition. On the other hand, assume we are given a GFA decomposition $\hat\Sigma + \tilde\Sigma$ of an
infinite covariance $\Sigma$. How do we find the hidden variables in the representation $y = Fx + \tilde y$?
We can answer this question under the constraint that both $x$ and $\tilde y$ belong to $H(y)$. Models of
this kind are called internal in stochastic realization.

Proposition 4.1 Assume that the covariance matrix $\Sigma$ of $y$ has a GFA decomposition of rank $q$. Then
$y$ has a GFA representation with $q$ factors where both $x$ and $\tilde y$ have components in $H(y)$.

Proof : By a standard Q-R factorization we can orthogonalize the columns of $F_n$:

$$[f_1^n \; \ldots \; f_q^n] = [g_1^n \; g_2^n \; \ldots \; g_q^n] \begin{bmatrix} 1 & r_{1,2}^n & r_{1,3}^n & \ldots & r_{1,q}^n \\ 0 & 1 & r_{2,3}^n & \ldots & r_{2,q}^n \\ \vdots & & \ddots & & \vdots \\ 0 & \ldots & & 0 & 1 \end{bmatrix} , \tag{50}$$

which we shall write compactly as

$$F_n = Q_n R_n , \tag{51}$$

where $Q_n := [g_1^n \; g_2^n \; \ldots \; g_q^n]$ has orthogonal columns. It is well-known that each $g_i^n$ can be
obtained by a sequential Gram-Schmidt orthogonalization procedure as the difference of $f_i^n$ with
its projection onto the subspace $\operatorname{span}\{f_j^n,\ j < i\} \subset \mathcal{F}_i^n$. Hence $\|g_i^n\|_2 \ge \|\tilde f_i^n\|_2$ and hence, by
assumption, $\|g_i^n\|_2$ tends to $\infty$ when $n \to \infty$.
Next, define

$$a_{i,n} := \frac{1}{\|g_i^n\|_2^2}\, [\, g_i^n(1) \;\; g_i^n(2) \;\; \ldots \;\; g_i^n(n) \;\; 0 \;\; \ldots \,]^\top , \tag{52}$$

where the $g_i^n$'s are as defined above. Since $\|g_i^n\|_2 \to \infty$ with $n$, we have $\|a_{i,n}\|_2 = 1/\|g_i^n\|_2 \to 0$ as
$n \to \infty$. Hence $a_{i,n}$ is an AS.

Note that we can express each $f_i^n$ as $f_i^n = g_i^n + \sum_{j=1}^{i-1} r_{j,i}^n\, g_j^n$, so that

$$a_{i,n}^\top f_i = \frac{(g_i^n)^\top f_i^n}{\|g_i^n\|_2^2} = \frac{\|g_i^n\|_2^2}{\|g_i^n\|_2^2} = 1 \tag{53}$$

for all $n$ large enough, and by a similar calculation one can easily check that $a_{i,n}^\top f_j = 0$ for all $j < i$.
With these $a_{i,n}$ construct a sequence of $q \times \infty$ matrices

$$A_n := \begin{bmatrix} a_{1,n}^\top \\ \vdots \\ a_{q,n}^\top \end{bmatrix} , \tag{54}$$

which provides an asymptotic left-inverse of $F$, in the sense that $\lim_{n\to\infty} A_n F = R$, where $R$ is
the limit of a sequence of $q \times q$ matrices all of which are upper triangular with ones on the main
diagonal. Next, define the random vector $z_n := A_n y$ which converges as $n \to \infty$ to a $q$-dimensional
$z$ whose components must belong to $\mathcal{G}(y)$. These $q$ components form in fact a basis for $\mathcal{G}(y)$ as
the covariance $\mathbb{E}\, z_n z_n^\top$ converges to $RR^\top$ which is non singular. From this, one can easily get an
orthonormal basis $x$ in $H(\hat y)$. Hence, since $F$ is known, we can form $\hat y = Fx$, and letting $\tilde y := y - \hat y$
does yield a GFA representation of $y$ inducing the given GFA decomposition of $\Sigma$. Uniqueness is
then guaranteed in force of Proposition 3.1. □
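The averaging vectors used in this proof are easy to build explicitly for finite $n$. The following sketch, a minimal illustration rather than the authors' implementation, runs classical Gram-Schmidt on truncated loading columns and returns $a_i = g_i / \|g_i\|^2$; the two test columns at the bottom are hypothetical. One can then check that $a_i^\top f_i = 1$, that $a_i^\top f_j = 0$ for $j < i$, and that $\|a_i\|^2 = 1/\|g_i\|^2$ is small when $n$ is large.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def averaging_vectors(columns: List[List[float]]) -> List[List[float]]:
    """Return a_i = g_i / ||g_i||^2, where g_i are the Gram-Schmidt vectors of the columns."""
    gs: List[List[float]] = []
    for f in columns:
        g = list(f)
        for h in gs:
            r = dot(h, f) / dot(h, h)  # entry of the unit upper triangular factor R
            g = [gk - r * hk for gk, hk in zip(g, h)]
        gs.append(g)
    return [[gk / dot(g, g) for gk in g] for g in gs]


if __name__ == "__main__":
    n = 200
    f1 = [1.0] * n                                      # hypothetical loading column
    f2 = [(-1.0) ** k + 0.3 for k in range(1, n + 1)]   # hypothetical loading column
    a1, a2 = averaging_vectors([f1, f2])
    # Rows of A_n F: upper triangular with ones on the diagonal.
    print([dot(a1, f1), dot(a1, f2)])
    print([dot(a2, f1), dot(a2, f2)])
```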
Short and long distance interaction



At this point we have collected enough information on the model structure to suggest an
interpretation of GFA models. We shall imagine a scenario of an ensemble of infinitely many agents
distributed in space generating the random variables $\{y(k) = \hat y(k) + \tilde y(k) ;\ k = 1, 2, \ldots\}$ and
interacting in a random fashion.

The idiosyncratic covariances $\tilde\sigma(k,j) = \mathbb{E}\, \tilde y(k)\tilde y(j)$ measure the mutual influence of the noises
$\tilde y(k)$, $\tilde y(j)$ of neighboring units. Since $\tilde\Sigma$ is a bounded operator in $\ell^2$ it is a known fact [1, Section 26] that
$\tilde\sigma(k,j) \to 0$ as $|k - j| \to \infty$, so in a sense the idiosyncratic component $\tilde y$ of a GFA representation
models only short range interaction among the agents, as $\tilde\sigma(k,j)$ decays with distance. Agents
which are far away from each other essentially do not feel any mutual influence.
On the other hand, $\mathbb{E}\, \hat y(k)\hat y(j) = \sum_{i=1}^{q} f_i(k) f_i(j)$ and the column vectors $f_i$ cannot
be in $\ell^2$. In particular, as stated in the proposition below, if the variances of the random variables
$y(k)$ are uniformly bounded then $f_i \in \ell^\infty$.

Proposition 4.2 If $y$ is a $q$-FS and has uniformly bounded variance, then the $f_i$'s are uniformly
bounded sequences (i.e. belong to the space $\ell^\infty$).

Proof : The statement follows since $\|\hat y(k)\|^2 \le M^2$, which is the same as $\sum_{i=1}^{q} f_i(k)^2 \le M^2$, and
hence $|f_i(k)| \le M$ for all $k$. □

Hence since the components $f_i(k)$ do not decay with distance, the products $f_i(k) f_i(j)$ generically
do not vanish when $|k - j| \to \infty$. Therefore the factor loadings describe "long range" correlation
between the factor components, and the $\hat y$ component of $y$ can be interpreted as variables modeling
the long range interaction among agents. In this sense $\hat y$ models a collective behavior of the ensemble.



5 Stationary sequences and the Wold decomposition

As we have just seen, non-stationarity may bring in some pathologies which seem to be difficult
to rule out. We consider now the special case in which the sequence $y$, defined on $\mathbb{Z}_+$, is (weakly)
stationary; i.e. $\mathbb{E}\, y(k)y(j) = \sigma(k - j)$ for $k, j \ge 0$. Let $H_k(y)$ be the closed linear span of all random
variables $\{y(s) ;\ s \ge k\}$. Introducing the remote future subspace of $y$:

$$H_\infty(y) = \bigcap_{k \ge 0} H_k(y) , \tag{55}$$

the sequence of orthogonal wandering subspaces $E_k := H_k(y) \ominus H_{k+1}(y)$ and their orthogonal direct
sum

$$\check H(y) = \bigoplus_{k \ge 0} E_k , \tag{56}$$

it is well known, see e.g. [19], [36], [26], that one has the orthogonal decomposition

$$y = \hat y + \tilde y , \qquad \hat y(k) \in H_\infty(y) , \quad \tilde y(k) \in \check H(y) \tag{57}$$

for all $k \in \mathbb{Z}_+$, the component $\hat y$ being the purely deterministic (PD), while $\tilde y$ the purely
non-deterministic (PND) components. The two sequences are orthogonal and uniquely determined.
Furthermore, it is well known that $\tilde y$ has an absolutely continuous spectrum with a spectral
density function, say $S_{\tilde y}(\omega)$, satisfying the log-integrability condition $\int \log S_{\tilde y}(\omega)\, d\omega > -\infty$, while the
spectral distribution of $\hat y$ is singular with respect to Lebesgue measure (for example consisting only
of jumps) possibly together with a spectral density such that $\int \log S_{\hat y}(\omega)\, d\omega = -\infty$, compare e.g.

m- 

In this section we want to give an interpretation of the decomposition (10) in the light of the
Wold decomposition. First we prove the following two lemmas.

Lemma 5.1 Let $y$ be stationary and assume it has an absolutely continuous spectrum with a
bounded spectral density; i.e.

$$S_y(\omega) \in L^\infty([-\pi, \pi]) . \tag{58}$$

Then $y$ is idiosyncratic. In particular, PND sequences with a bounded spectral density are
idiosyncratic sequences.

Proof : By a well known theorem of Szegő [25, p. 65], see also [27], $\Sigma$ is a bounded Toeplitz operator,
thus for any AS $a_n$,

$$\|a_n^\top y\|^2 = \|a_n\|_\Sigma^2 = a_n^\top \Sigma\, a_n \le \|\Sigma\| \, \|a_n\|_2^2 , \tag{59}$$

and since $\|a_n\|_2^2 \to 0$, also $\|a_n^\top y\|^2 \to 0$, and $y$ is idiosyncratic. □
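The bound (59) can be made concrete on a simple bounded-density example of our own choosing: a first-order moving average with unit-variance innovations and parameter `theta`, whose covariance is the tridiagonal Toeplitz matrix with $\sigma(0) = 1 + \theta^2$ and $\sigma(\pm 1) = \theta$, and whose spectral density $1 + \theta^2 + 2\theta\cos\omega$ is bounded by $(1 + |\theta|)^2$. The sketch applies the uniform averaging sequence $a_n = (1/n, \ldots, 1/n, 0, \ldots)$ and compares the resulting variance with the bound $(1 + |\theta|)^2 / n$.

```python
from typing import List


def quadratic_form_ma1(a: List[float], theta: float) -> float:
    """a^T Sigma a for the Toeplitz covariance with sigma(0) = 1 + theta^2 and sigma(+-1) = theta."""
    diag = (1.0 + theta ** 2) * sum(x * x for x in a)
    off = 2.0 * theta * sum(a[k] * a[k + 1] for k in range(len(a) - 1))
    return diag + off


def uniform_average(n: int) -> List[float]:
    """Nonzero part of the averaging sequence a_n = (1/n, ..., 1/n, 0, 0, ...)."""
    return [1.0 / n] * n


if __name__ == "__main__":
    theta = 0.9  # illustrative value
    for n in (10, 100, 1000):
        var = quadratic_form_ma1(uniform_average(n), theta)
        bound = (1.0 + abs(theta)) ** 2 / n
        print(n, var, bound)
```

Both columns decay like $1/n$, with the variance always below the bound, which is the idiosyncratic behaviour asserted by the lemma.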



Lemma 5.2 Let $y$ be a stationary sequence with a bounded spectral density, then

$$\mathcal{G}(y) \subset H_\infty(y) . \tag{60}$$

Proof : Assume that $z \in \mathcal{G}(y)$. Then there exists an AS such that $z = \lim_n a_n^\top y$. Applying the
Wold decomposition we obtain

$$z = \lim_{n\to\infty} a_n^\top y = \lim_{n\to\infty} a_n^\top \hat y + \lim_{n\to\infty} a_n^\top \tilde y . \tag{61}$$

By Lemma 5.1 the PND part vanishes as $n$ tends to infinity, thus $z \in H_\infty(y)$. □

Note that the statement holds in particular for PD processes with a singular spectrum, as in
this case $S_y(\omega) = 0$. The converse inclusion, i.e. $H_\infty(y) \subset \mathcal{G}(y)$, is in general not true. However,
for stationary sequences with a finite dimensional remote future, we can state the following.

Theorem 5.1 Assume that $y$ is a stationary sequence with a bounded spectral density and that
$\dim H_\infty(y) < \infty$. Then $H_\infty(y) = \mathcal{G}(y)$.

Proof : It is sufficient to show that $H_\infty(y) \subset \mathcal{G}(y)$.

Let $\dim H_\infty(y) = q$. By assumption $H_k(\hat y) \supseteq H_\infty(y)$ has dimension greater than or equal to $q$ for
all $k \ge 0$. It follows that for any $k$, the random variables $\hat y(k+1), \ldots, \hat y(k+q)$ must be linearly
independent. For otherwise the $q \times q$ covariance matrix

$$\hat\Sigma_q := \mathbb{E} \begin{bmatrix} \hat y(k+1) \\ \vdots \\ \hat y(k+q) \end{bmatrix} \begin{bmatrix} \hat y(k+1) & \ldots & \hat y(k+q) \end{bmatrix} \tag{62}$$

would be singular of rank $r < q$ and hence, because of the Toeplitz structure, one would have
$\operatorname{rank} \hat\Sigma_n = r < q$ for all $n \ge q$, which implies that one can extract only $r$ linearly independent
random variables from an arbitrarily long string of random variables of the process. This in turn
would imply $\dim H_\infty(y) = r < q$, contrary to our assumption. Therefore

$$\operatorname{span}\{\hat y(k+1), \ldots, \hat y(k+q)\} \supseteq H_\infty(y)$$

and for any $z \in H_\infty(y)$ there is a nonzero $b_k \in \mathbb{R}^q$ such that

$$z = b_k^\top \begin{bmatrix} \hat y(k+1) \\ \vdots \\ \hat y(k+q) \end{bmatrix} \qquad \text{for all } k , \tag{63}$$

where $\hat y(k+1), \ldots, \hat y(k+q)$ are the projections of $y(k+1), \ldots, y(k+q)$ onto $H_\infty(y)$.
Furthermore, the Euclidean norm $\|b_k\|$ is the same for all $k$ because of stationarity. Hence, choosing
$k = 0, q, 2q, \ldots, (n-1)q$, one also has

$$z = \frac{1}{n} \big[\, b_0^\top \;\; b_q^\top \;\; \ldots \;\; b_{(n-1)q}^\top \;\; 0 \;\; \ldots \;\; 0 \,\big]\, \hat y := a_n^\top \hat y , \tag{64}$$

where the sequence $\{a_n,\ n \in \mathbb{Z}_+\}$ is clearly an AS. It follows that

$$\lim_{n\to\infty} a_n^\top y = \lim_{n\to\infty} a_n^\top \hat y + \lim_{n\to\infty} a_n^\top \tilde y = z , \tag{65}$$

where the last identity is a consequence of Lemma 5.1. Therefore $z \in \mathcal{G}(y)$. □



Theorem 5.2 Every stationary sequence with a bounded spectral density and remote future space
of dimension $q$ is a $q$-factor sequence. It admits a unique generalized factor analysis representation
(47) where $\hat y$ is the purely deterministic and $\tilde y$ is the purely non-deterministic component of $y$.

Note in particular that the spectral density of $\tilde y$ must necessarily satisfy the log-integrability
condition.

When $H_\infty(y)$ is finite dimensional, the PD component of a stationary process has a special
structure, namely

$$\hat y(k) = \sum_{i=1}^{q} v_i \cos \omega_i k + w_i \sin \omega_i k , \tag{66}$$

where $e^{j\omega_i}$, $i = 1, 2, \ldots, q$ are the $q$ eigenvalues of the unitary shift operator of the process.
The $\omega_i$ are distinct real frequencies in $[0, \pi)$ and $v_i$ and $w_i$ are mutually uncorrelated zero-mean
random variables with $\operatorname{var}[v_i] = \operatorname{var}[w_i]$ which span the subspace $H(\hat y) = H_\infty(y)$.

In the following proposition, we show how to construct AS's that generate a basis in the
finite-dimensional remote future space.

Proposition 5.1 The latent factors of a stationary $q$-factor sequence can be recovered using
averaging sequences $\{a_{i,n}\}_{n \in \mathbb{N}}$ of the type

$$a_{i,n}(k) = \begin{cases} \dfrac{1}{n} \sin \omega_i k & k \le n \\ 0 & k > n \end{cases} \tag{67}$$

or

$$a_{i,n}(k) = \begin{cases} \dfrac{1}{n} \cos \omega_i k & k \le n \\ 0 & k > n \end{cases} \tag{68}$$

by letting $\omega_i$ vary on the set of proper frequencies of the signal (66).



Proof : Consider the AS $\{a_n\}_{n \in \mathbb{N}}$ of (67), with a fixed frequency $\omega_i = \omega_p$, $p \in \{1, \ldots, q\}$, and apply
it to the sequence $y$. While the idiosyncratic (PND) part vanishes asymptotically, the $q$-aggregate
(PD) component (66) yields the sequence of random variables

$$\begin{aligned}
z_n = a_n^\top \hat y &= \sum_{k=1}^{n} a_n(k)\, \hat y(k) = \frac{1}{n} \sum_{k=1}^{n} \Big[ \sin \omega_p k \sum_{i=1}^{q} ( v_i \cos \omega_i k + w_i \sin \omega_i k ) \Big] \\
&= \frac{1}{n} \sum_{k=1}^{n} \Big[ \sum_{i=1,\, i \neq p}^{q} ( v_i \sin \omega_p k \cos \omega_i k + w_i \sin \omega_p k \sin \omega_i k ) + v_p \sin \omega_p k \cos \omega_p k + w_p \sin^2 \omega_p k \Big] .
\end{aligned} \tag{69}$$

It is well-known and not difficult to check directly, using elementary trigonometric identities such
as $\sin \alpha \cos \beta = \tfrac{1}{2}[\sin(\alpha + \beta) + \sin(\alpha - \beta)]$ and the formula

$$\left| \frac{1}{n} \sum_{k=1}^{n} e^{j\omega k} \right| = \frac{1}{n} \left| \frac{1 - e^{j\omega n}}{1 - e^{j\omega}} \right| \le \frac{1}{n}\, \frac{1}{|\sin(\omega/2)|} ,$$

that all time averages of products of sin and cos functions in this sum vanish asymptotically except
for the $\sin^2$ term, which has the limit

$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} w_p \sin^2 \omega_p k = \frac{w_p}{2} , \tag{70}$$

which is one of the latent factors. Similarly, the random variables $v_i$, associated with cosine-type
oscillations, can be recovered using averaging sequences of the type (68). □

Obviously one can obtain arbitrary linear combinations $\sum_{i} c_i v_i + d_i w_i$ by properly combining
the AS's (67) and (68).
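The recovery of the latent amplitudes by sinusoidal averaging can be tried on a single realization of the PD component. In the sketch below the frequencies and the values standing in for one draw of the random variables $v_i$, $w_i$ are arbitrary illustrative numbers, and the PND part is omitted since it averages out; the sine average should approach $w_p/2$ and the cosine average $v_p/2$.

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (omega_i, v_i, w_i)


def pd_signal(k: int, comps: List[Component]) -> float:
    """y(k) = sum_i v_i cos(omega_i k) + w_i sin(omega_i k)."""
    return sum(v * math.cos(om * k) + w * math.sin(om * k) for om, v, w in comps)


def sine_average(comps: List[Component], omega_p: float, n: int) -> float:
    """z_n = (1/n) sum_{k=1}^n sin(omega_p k) y(k): the sine-type averaging sequence applied to y."""
    return sum(math.sin(omega_p * k) * pd_signal(k, comps) for k in range(1, n + 1)) / n


def cosine_average(comps: List[Component], omega_p: float, n: int) -> float:
    """Same construction with the cosine-type averaging sequence."""
    return sum(math.cos(omega_p * k) * pd_signal(k, comps) for k in range(1, n + 1)) / n


if __name__ == "__main__":
    comps = [(0.7, 1.2, -0.4), (2.1, -0.8, 2.0)]  # illustrative realization
    for n in (100, 1000, 20000):
        print(n, sine_average(comps, 0.7, n), cosine_average(comps, 2.1, n))
```

The error of both averages decays like $1/n$, in line with the geometric-sum bound used in the proof.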

Discussion

We have shown that there is a natural interpretation of generalized factor analysis models in terms
of the Wold decomposition of stationary sequences. A stationary sequence admits a generalized
factor analysis representation if and only if its spectral density is bounded and the remote future
space is finite dimensional. Both conditions are necessary since a PD stationary process has a finite
factor representation if and only if its remote future has finite dimension. On the other hand there
are stationary processes with a finite dimensional remote future space, whose PND component has
an unbounded spectral density. It follows from Szegő's theorem that $\tilde\Sigma$ is then an unbounded operator
and these processes are neither aggregate nor idiosyncratic.

In the classical papers [12], [23] stationarity with respect to the cross-sectional (space) index is
not assumed. However without stationarity, there may be random sequences which fail to satisfy the
eigenvalue conditions of Theorem 4.1 and do not admit a generalized factor analysis representation.
A precise characterization of which class of non-stationary sequences admits a GFA representation
seems still to be an open problem.

6 GFA models of Random Fields

We come back to the question raised in Section 1.6, namely when does a second order random field
have a flocking component and how to extract it from sample measurements of $y(k,t)$. A simple
class of random fields for which this question can be answered positively is the class of separable
space-time processes

$$y(k,t) = v(k)\, u(t) \tag{71}$$

which are the product of a space component, $v(k)$, and a time component, $u(t)$, both zero mean and with finite
variance. This model can be generalized, for example making both $v(k)$ and $u(t)$ vector-valued, but
this would require extending our static theory in the preceding sections to vector-valued processes
as well. Although this is quite straightforward, involving no new concepts but just more notation,
for the sake of clarity we shall restrict ourselves to the scalar case.

The model (71) needs to be specified probabilistically, as the dynamics of the "time" process $\{u(t)\}$
may well be space dependent and dually, the distribution of $v(k)$ may be a priori time-dependent.
The following assumption specifies in probabilistic terms the multiplicative structure (71) of the
random field $y(k,t)$.

Assumption : The space and time evolutions of $y(k,t)$ are multiplicatively uncorrelated in the
sense that

$$\mathbb{E}\{ v(k_1) v(k_2) \mid u(t_1) u(t_2) \} = \mathbb{E}_v\{ v(k_1) v(k_2) \} \tag{72}$$

where the first conditional expectation is made with respect to the conditional probability
distribution of $v$ given the random variables $u(t_1)$, $u(t_2)$, while the second expectation is with respect
to the marginal distribution of $v$.

From the multiplicative uncorrelation (72) one gets

$$\mathbb{E}\{ v(k_1) v(k_2) u(t_1) u(t_2) \} = \mathbb{E}\{ v(k_1) v(k_2) \}\, \mathbb{E}\{ u(t_1) u(t_2) \} = \sigma_v(k_1, k_2)\, \sigma_u(t_1, t_2) \tag{73}$$

where $\sigma_v$ and $\sigma_u$ are the covariance functions of the two processes. Hence the covariance function
of the random field inherits the separable structure of the process. If $v$ and $u$ are jointly Gaussian,
the multiplicative uncorrelation property follows if the two components are uncorrelated; namely
their joint covariance is separable. This is a structure which is often assumed in the literature, see
[33] and references therein. Assume now that the space process has a nontrivial GFA representation
with $q$ factors

$$v(k) = \sum_{i=1}^{q} f_i(k)\, z_i + \tilde v(k) \tag{74}$$

where $\hat v(k) := \sum_i f_i(k) z_i$ is the aggregate and $\tilde v(k)$ the idiosyncratic component of $v(k)$. Then,
setting $x_i(t) = z_i u(t)$ and $\tilde y(k,t) := \tilde v(k) u(t)$, one can represent the random field (71) by a dynamic
GFA model,

$$y(k,t) = \sum_{i=1}^{q} f_i(k)\, x_i(t) + \tilde y(k,t) := \hat y(k,t) + \tilde y(k,t) . \tag{75}$$

Proposition 6.1 If the processes $v$ and $u$ are multiplicatively uncorrelated then the two terms
$\hat y(k,t)$ and $\tilde y(h,s)$ in the GFA model (75) are uncorrelated for all $k, h$ and $t, s$. Hence a separable
random field satisfying the multiplicative uncorrelation property has a flocking component if and
only if its space process $v$ has a nontrivial aggregate component.

Proof : We have

$$\mathbb{E}\{ \hat y(k,t)\, \tilde y(h,s) \} = \sum_{i=1}^{q} f_i(k)\, \mathbb{E}\{ z_i\, u(t)\, \tilde v(h)\, u(s) \} \tag{76}$$

where the expectation in the last term can be written as

$$\mathbb{E}\{ z_i \tilde v(h) u(t) u(s) \} = \mathbb{E}\{ \mathbb{E}_v[ z_i \tilde v(h) \mid u(t) u(s) ]\, u(t) u(s) \} = \mathbb{E}\{ \mathbb{E}_v[ z_i \tilde v(h) ]\, u(t) u(s) \} = 0 \tag{77}$$

since the $z_i$'s are random variables in $H(\hat v)$ and $\tilde v(h)$ is orthogonal to this space. The last statement
then follows directly. □

Let now $v$ be second-order weakly stationary satisfying the conditions of Theorem 5.2. Here is
probably the simplest nontrivial example of decomposition (75).

Example 6.1 (Exchangeable space processes) Consider the case of a (weakly) exchangeable
space process $v$; i.e. a process whose second order statistics are invariant with respect to all index
permutations of locations $(k,j)$. Clearly the covariances $\sigma_v(k,j) = \mathbb{E}\, v(k) v(j)$ must be independent
of $k, j$ for $k \neq j$ and $\sigma_v(k,k) = \sigma^2 > 0$ must be independent of $k$ [2]. Letting $\rho := \sigma_v(k,j)$, $k \neq j$,
one has

$$\Sigma_v = \begin{bmatrix} \sigma^2 & \rho & \rho & \ldots \\ \rho & \sigma^2 & \rho & \ldots \\ \rho & \rho & \sigma^2 & \ldots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \tag{78}$$

where $\sigma^2 > |\rho|$ for positive definiteness. Letting $f$ denote an infinite column vector with components
all equal to $\sqrt{\rho}$ (with $\rho \ge 0$), one can decompose $\Sigma_v$ as

$$\Sigma_v = f f^\top + (\sigma^2 - \rho)\, I \tag{79}$$

where here $I$ denotes an infinite identity matrix. This is a Factor Analysis decomposition of rank
$q = 1$ of $\Sigma_v$ with $\tilde\Sigma_v$ a diagonal matrix. Hence a weakly exchangeable space process is a 1-factor
process with an idiosyncratic component which is actually white. In the GFA representation (74)
there is just one factor $z$ and the factor loading vector $f$ does not depend on the space coordinate.
Consider a random field with the multiplicative structure (71); then the flocking component

$$\hat y(k,t) = \sqrt{\rho}\; x(t) , \qquad x(t) = z\, u(t)$$

describes a constant, space independent, configuration moving randomly in time.
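The structure of the exchangeable covariance is simple enough to verify directly on a finite block. The sketch below, with illustrative values `sigma2 = 2.0` and `rho = 0.5`, takes the loading vector with entries $\sqrt{\rho}$ (so that $ff^\top$ has all entries equal to $\rho$, as the decomposition requires) and exposes the two eigenvalues of the $n \times n$ block: $\sigma^2 + (n-1)\rho$ on the constant vector, which grows with $n$, and $\sigma^2 - \rho$ on any vector orthogonal to it.

```python
from typing import List


def exchangeable_cov(n: int, sigma2: float, rho: float) -> List[List[float]]:
    """n x n matrix with sigma2 on the diagonal and rho elsewhere."""
    return [[sigma2 if i == j else rho for j in range(n)] for i in range(n)]


def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def factor_part(n: int, rho: float) -> List[List[float]]:
    """Rank-one term f f^T with f(k) = sqrt(rho) for all k (requires rho >= 0)."""
    f = [rho ** 0.5] * n
    return [[fi * fj for fj in f] for fi in f]


if __name__ == "__main__":
    n, sigma2, rho = 6, 2.0, 0.5  # illustrative values only
    S = exchangeable_cov(n, sigma2, rho)
    print(matvec(S, [1.0] * n))                      # each entry: sigma2 + (n - 1) * rho
    print(matvec(S, [1.0, -1.0] + [0.0] * (n - 2)))  # (sigma2 - rho) times the input
```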

6.1 Statistical estimation

Assume that the space component of the random field is stationary and we have a snapshot of the
system at a certain time $t_0$; that is, we have observations of a "very large" portion of the process
$\{y(k, t_0),\ k = 1, 2, \ldots, N\}$ at some fixed time $t_0$. With these sample data we may form the sample
covariance estimates

$$\hat\sigma_N(h, t_0) := \frac{1}{N} \sum_{k=1}^{N} y(k+h, t_0)\, y(k, t_0) = \frac{1}{N} \sum_{k=1}^{N} v(k+h)\, v(k)\, u(t_0)^2 , \qquad h = 0, 1, 2, \ldots \tag{80}$$

which also have the multiplicative structure $\hat\sigma_N(h, t_0) = \hat\sigma_{v,N}(h)\, u(t_0)^2$, where $\hat\sigma_{v,N}(h)$ is the sample
covariance estimate of the $v$ process based on $N$ data. Now by the assumptions made on the
space-process $v$ the limit $\lim_{N\to\infty} \hat\sigma_N(h, t_0)$ exists (although it may be sample dependent for the PD
part), so the sample matrix covariance estimate, which has the form

$$\hat\Sigma_N(t_0) := \begin{bmatrix} \hat\sigma_N(0, t_0) & \hat\sigma_N(1, t_0) & \ldots & \hat\sigma_N(N-1, t_0) \\ \hat\sigma_N(1, t_0) & \hat\sigma_N(0, t_0) & \ldots & \hat\sigma_N(N-2, t_0) \\ \vdots & & \ddots & \vdots \\ \hat\sigma_N(N-1, t_0) & \ldots & \hat\sigma_N(1, t_0) & \hat\sigma_N(0, t_0) \end{bmatrix} = u(t_0)^2\, \hat\Sigma_{v,N} \tag{81}$$

will converge to a limit for $N \to \infty$.

Following [12], [23] the idea is now to do PCA on the covariance estimate for increasing $N$ and isolate
$q$ eigenvalues which tend to grow without bound as $N \to \infty$ while the others stay bounded. The
$q$ corresponding eigenvectors will tend as $N \to \infty$ to the $q$ factor loadings $f_1, \ldots, f_q$ and therefore
provide asymptotically the F.A. decomposition of the $\Sigma_v$ matrix

$$\Sigma_v = FF^\top + \tilde\Sigma_v .$$

After $F$ and $\tilde\Sigma_v$ are estimated, the stochastic realization procedure of Sect. 4 permits to construct
the factor vector $z$ and the idiosyncratic component $\tilde v$ of the GFA representation of $v$ as in (74).
The reconstruction of the time varying factor variables $x_i(t) = z_i u(t)$ of $\hat y$ from the observations
$y(k,t) = v(k) u(t)$ can be done, in several equivalent ways, by averaging on the space variable.



7 Conclusions 

We have proposed a new modeling paradigm for large dimensional aggregates of random systems by 
the theory of Generalized Factor Analysis models. We have discussed static GFA representations 
and characterized in a rigorous way the properties of the aggregate and idiosyncratic components 
of these models. For wide-sense stationary sequences the character and existence of these models 
has been completely clarified in the light of the Wold decomposition. The extraction of the flocking 
component of a random field has been discussed for a simple class of separable random fields.